(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property Organization 

International Bureau 







(43) International Publication Date (10) International Publication Number 

15 March 2001 (15.03.2001) PCT WO 01/19013 A1 



(51) International Patent Classification 7 : H04L 1/06 

(21) International Application Number: PCT/US00/24641 

(22) International Filing Date: 

8 September 2000 (08.09.2000) 

(25) Filing Language: English 

(26) Publication Language: English 

(30) Priority Data: 

60/152,982 9 September 1999 (09.09.1999) US 

(71) Applicant: HOME WIRELESS NETWORKS, INC. 

[US/US]; 3145 Avalon Ridge Place, Norcross, GA 30071 
(US). 

(72) Inventor: ARIYAVISITAKUL, Sirikiat, L.; 875 Mount 
Katahdin Trace, Alpharetta, GA 30022 (US). 



(74) Agents: PRATT, John, S. et al.; Kilpatrick Stockton LLP, 
Suite 2800, 1100 Peachtree Street, Atlanta, GA 30309- 
4530 (US). 

(81) Designated States (national): AE, AG, AL, AM, AT, AU, 
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CR, CU, CZ, 
DE, DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR, 
HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, 
LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, MZ, 
NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, 
TR, TT, TZ, UA, UG, UZ, VN, YU, ZA, ZW. 

(84) Designated States (regional): ARIPO patent (GH, GM, 
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), Eurasian 
patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European 
patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, 
IT, LU, MC, NL, PT, SE), OAPI patent (BF, BJ, CF, CG, 
CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG). 

Published: 

— With international search report. 






(54) Title: TURBO DETECTION OF SPACE-TIME CODES 




Coded layered space-time architecture: LST-I 



(57) Abstract: Communication systems which employ multiple transmit and receive antenna-element arrays. Data streams for 
transmission may be interleaved among the transmit antenna elements in order to reduce decision errors. Turbo processing of equal- 
izer output from a number of layers in a layered space-time processing architecture may be employed to reduce decision errors. 
Additionally, space-time equalization may be performed to maximize signal to noise ratio such as via minimum mean square error 
processing, rather than zero forcing, in order to achieve the Shannon limit, reduce multi-path effects and/or reduce intersymbol in- 
terference. Moreover, the receiver can select number and/or identity of receive antenna elements from among a larger group in order 
to optimize performance of the system. 






For two-letter codes and other abbreviations, refer to the "Guid- 
ance Notes on Codes and Abbreviations" appearing at the begin- 
ning of each regular issue of the PCT Gazette. 



WO 01/19013 



PCT/US00/24641 



TURBO DETECTION OF SPACE-TIME CODES 



RELATED APPLICATIONS 

This document claims priority to and incorporates by reference 
copending provisional USSN 60/152,982 entitled "Turbo Space-Time 
Processing to Improve Wireless Channel Capacity" filed on September 9, 
1999. 

FIELD OF INVENTION 

The present invention relates to systems and processes for radio 
communications using multiple-element antenna array technology. 

BACKGROUND 

Turbo processing and space-time equalization are terms that 
comprehend several conventional ways to increase wireless channel capacity. 
Generally, turbo coding and/or processing refers to techniques aimed at 
approaching the Shannon limit in a channel, while space-time processing 
refers to techniques for processing signals from multi-element antenna arrays 
to exploit the multi-path nature of fading wireless environments. 

European patent application no. EP 817 401 A2 published July 1, 1998 
in the name of Foschini discloses the use of a number of processing layers for 
space time processing of signals from multiple-receiver antenna elements. 
There, the transmitter feeds a number of transmitter antenna elements by 
cyclically apportioning segments of the modulated encoded stream of data to 
transmitter antenna elements. At the receiver, a number of receiver antenna 
elements are coupled to a number of processing layers, in order to perform 
the space-time processing. Signal components received during respective 
periods of time over a plurality of the receive antenna elements are formed 
into respective space and time relationships in which space is associated with 
respective transmitter antenna elements. Preprocessing occurs so that a 
collection of signal components having the same space-time relationship 
forms a signal vector such that particular decoded signal contributions can be 
subtracted from the signal vector while particular undecoded contributions can 
be nulled out of the signal vector. The resulting vector is then supplied to a 
decoder for decoding to reform the data stream. Such conventional systems 
and techniques are further described in documents referred to in the "Detailed 
Description" section of this document. 

SUMMARY OF THE INVENTION 

Systems and processes according to the present invention employ a 
number of transmitter antenna elements and a number of receiver antenna 
elements coupled to multiple space-time processing layers in the receiver. In 
the present invention, however, portions of the information stream being 
communicated can be interleaved among transmitter antenna elements such 
as on a random or pseudo random basis; among other things, such 
interleaving decreases decision errors in the space-time equalization process. 
Furthermore, each processing layer preferably includes turbo processing in 
order to feed soft decisions about information being processed back to the 
equalizers. Moreover, space-time equalization processes according to the 
present invention preferably seek to maximize signal to noise ratio rather than 
zero forcing, as well as reduce multi-path effects and intersymbol interference. 
A preferred process uses minimum mean square error processing which 
allows the Shannon limit actually to be achieved. Furthermore, systems and 
processes according to the present invention preferably allow selection of the 
number and identity of receiver antenna elements to which the receiver may 
be coupled in order to optimize performance. 

According to one embodiment of the invention, an information source is 
coupled to provide a plurality of data streams to a plurality of transmit 
antennas, via, for each stream, an encoder, interleaver and symbol mapper. 
On the receiver side, a plurality of M receiver elements are coupled to a 
plurality of processing layers. The number of receiver antenna elements M is 
preferably greater than or equal to the number N of transmit antenna 
elements, since equalization according to the present invention does not 
require an extra degree of freedom. The M receiver antenna elements are 
coupled to the first processing layer by coupling to a space-time equalizer 
which preferably applies minimum mean square error processing in order to 
maximize signal to noise ratio. The output of the equalizer is applied to a 
deinterleaver, after which the deinterleaved stream is supplied to a decoder in 
the layer. Output of the decoder is provided for output common with the 
output from the other decoders in the other layers. Preferably, each layer also 
includes an interleaver which receives output from the decoder and 
deinterleaver and supplies its interleaved output back to the equalizer in the 
layer in order to provide soft decision making to the equalizer. In successive 
processing layers, output from the decoder of the preceding layer is combined 
with information from the interference canceler of the layer preceding the 
preceding layer (except the second layer, which receives signals from an 
interference canceller which is coupled to the decoder of the first layer and to 
the receive antenna elements). 

According to an alternate embodiment, the deinterleaver, interleaver 
and decoder are shared among layers, so that the equalizer of each layer 
outputs to a deinterleaver common to all layers. The output of the 
deinterleaver may then be coupled to a decoder which again is common to all 
layers. An interleaver may be provided which receives output from the 
deinterleaver and the decoder and applies it to each equalizer for soft 
decisions to be applied to the equalizers. 

Accordingly, components for deinterleaving, decoding and 
reinterleaving may be functionally located in each layer, or common to the 
layers. In the first case, each layer below the first layer processes signals 
from an interference canceller which receives signals from a decoder in the 
preceding layer and from the antenna elements (in the case of the second 
layer) or the interference canceller in the next-preceding layer (in the case of 
other layers). In the second case, each layer below the first processes 
signals from an interference canceller which receives signals from the 
equalizer in the preceding layer and from the antenna elements (in the case of 
the second layer) or the interference canceller in the next-preceding layer (in 
the case of other layers). Such turbo processing architectures can be used in 
connection with layered space-time equalization which relies on zero forcing 
rather than minimum mean square error processing. They can also be used in 
multi-array systems in which the data streams are periodically cycled rather 
than interleaved. 

It is accordingly an object of the present invention to provide improved 
layered space-time processing for communication systems which employ turbo 
processing techniques in order, among other things, to reduce decision errors. 

It is an additional object of the present invention to provide layered 
space-time processing for communication systems which seeks to maximize 
signal to noise ratio, thereby better addressing the Shannon limit, and which 
also addresses multi-path effects and/or intersymbol interference. 

It is an additional object of the present invention to provide processing 
for communication systems in which data streams may be interleaved rather 
than periodically cycled among transmit antenna elements, in order, among 
other things, to reduce decision errors. 

It is an additional object of the present invention to provide layered 
space-time processing for communication systems in which a receiver can 
select a set of antenna elements, including their number and/or identity, from 
among a larger group of antenna elements in order to optimize performance 
of the system. 

Other objects, features, and advantages of the present invention will 
become apparent with respect to the remainder of this document. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1(a) is a schematic diagram showing a first embodiment of 
communications systems according to the present invention. 

Fig. 1(b) is a schematic diagram showing a second embodiment of 
communications systems according to the present invention. 

Fig. 2 is a schematic diagram showing one form of space-time 
processing according to the present invention. 

Figs. 3(a) and 3(b) are diagrams which compare performance between 
two coding schemes according to the present invention. 

Fig. 4 is a diagram which shows different capacity bounds for 
processing according to the present invention over a flat Rayleigh fading 
channel. 

Fig. 5 is a diagram which shows simulation results for a system 
according to the first embodiment of the present invention with two transmit 
and two receive antenna elements. 

Fig. 6 is a diagram which shows simulation results for a system 
according to the first embodiment of the present invention with two transmit 
and four receive antenna elements. 

Fig. 7 is a diagram which shows simulation results for a system 
according to the first embodiment of the present invention with two, four and 
eight receive antenna elements, using soft decisions and six turbo iterations. 

Fig. 8 is a diagram which shows simulation results for a system 
according to the second embodiment of the present invention with two, four 
and eight receive antenna elements, using soft decisions and two turbo 
iterations. 

Figs. 9(a) and 9(b) are diagrams which show simulated performance of 
the first embodiment of the present invention using soft decisions and four 
antenna elements for typical urban and hilly terrain profiles. 

DETAILED DESCRIPTION 

The documents and references cited in the following disclosure are 
incorporated herein by this reference. 

Abstract: By deriving a generalized Shannon capacity formula 
for multiple-input, multiple-output Rayleigh fading channels, and 
by suggesting a layered space-time architecture concept that attains 
a tight lower bound on the capacity achievable, Foschini has 
shown a potentially enormous increase in the information capacity 
of a wireless system employing multiple-element antenna arrays 
at both the transmitter and receiver. The layered space-time 
architecture allows signal processing complexity to grow linearly, 
rather than exponentially, with the promised capacity increase. 
This paper includes two important contributions: First, we show 
that Foschini's lower bound is, in fact, the Shannon bound when the 
output signal-to-noise ratio (SNR) of the space-time processing in 
each layer is represented by the corresponding "matched filter" 
bound. This proves the optimality of the layered space-time 
concept. Second, we present an embodiment of this concept 
for a coded system operating at a low average SNR and in the 
presence of possible intersymbol interference. This embodiment 
utilizes the already advanced space-time filtering, coding and 
turbo processing techniques to provide a practical solution 
to the processing needed. Performance results are provided for 
quasi-static Rayleigh fading channels with no channel estimation 
errors. We see for the first time that the Shannon capacity for 
wireless communications can be both increased by N times (where 
N is the number of the antenna elements at the transmitter 
and receiver) and achieved within about 3 dB in average SNR, 
about 2 dB of which is a loss due to the practical coding scheme 
we assume; the layered space-time processing itself is nearly 
information-lossless! 

Index Terms: Equalization, interference suppression, space-time 
processing, turbo processing. 



I. Introduction 

"TURBO" and "space-time" are two of the most explored 
concepts in modern-day communication theory and 
wireless research. From a communication theorist's viewpoint, 
"turbo" coding/processing is a way to approach the Shannon 
limit on channel capacity, while "space-time" processing is 
a way to increase the possible capacity by exploiting the rich 
multipath nature of fading wireless environments. We will see 
through a specific embodiment in this paper that combining the 
two concepts provides a practical way to both increase 
and approach the possible wireless channel capacity. 

With growing bit rate demand in wireless communications, 
it is especially important to use the spectral resource efficiently. 

Paper approved by K. B. Letaief, the Editor for Wireless Systems of the IEEE 
Communications Society. Manuscript received September 15, 1999; revised 
December 3, 1999. This paper was presented at the IEEE International Conference 
on Communications, New Orleans, LA, June 2000. 

The author is with Home Wireless Networks, Norcross, GA 30071 USA 
(e-mail: lek@homewireless.com). 

Publisher Item Identifier S 0090-6778(00)07111-7. 



The basic information theory results reported by Foschini and 
Gans [1] have promised extremely high spectral efficiencies 
possible through multiple-element antenna array technology. 
In high scattering wireless environments (e.g., troposcatter, 
cellular, and indoor radio), the use of multiple spatially separated 
and/or differently polarized antennas at the receiver has 
been very effective in providing diversity against fading [2], 
[3]. Receiver diversity techniques also create signal processing 
opportunities for interference suppression and equalization 
(e.g., [4]-[6]). However, using multiple antennas at either the 
transmitter or the receiver does not enable a significant gain in 
the possible channel capacity. According to [1], the Shannon 
capacity for a system with 1 transmit and N receive antennas 
scales only logarithmically with N, as N → ∞. For a system 
using N transmit and 1 receive antennas, asymptotically 
there is no additional capacity to be gained, assuming that the 
transmit power is divided equally among the N antennas. 

Foschini and Gans [1] have shown that the asymptotic 
capacity of multiple-input, multiple-output (MIMO) Rayleigh 
fading channels grows, instead, linearly with N when N 
antennas are used at both the transmitter and the receiver. 
Furthermore, in [7], Foschini suggested a layered space-time 
architecture concept that can attain a tight lower bound on the 
capacity achievable. In this layered space-time architecture, 
N information bit streams are transmitted simultaneously 
(in the same frequency band) using N diversity antennas. 
The receiver uses another N diversity antennas to decouple 
and detect the N transmitted signals, one signal at a time. 
The decoupling process in each of the N processing "layers" 
involves a combination of nulling out the interference from 
yet undetected signals (N diversity antennas can null up to 
N - 1 interferers, regardless of the angles-of-arrival [5]) and 
canceling out the interference from already detected signals. 
One very significant aspect of this architecture is that it 
allows an N-dimensional signal processing problem, which 
would otherwise be solvable only through multiuser detection 
methods [8] with m^N complexity (m is the signal constellation 
size), to be solved with only N similar 1-D processing steps. 
Namely, the processing complexity grows only linearly with 
the promised capacity. 
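
To make the complexity comparison concrete, the following minimal Python sketch counts candidate hypotheses for joint detection and for layered detection. The function names and the per-layer cost model (each layer testing the m constellation points once) are assumptions made for illustration only; they are not taken from [7] or [8].

```python
def joint_detection_hypotheses(m: int, n: int) -> int:
    """Candidate symbol vectors a joint (multiuser) detector must consider: m**n."""
    return m ** n


def layered_detection_hypotheses(m: int, n: int) -> int:
    """Assumed cost model: n layers, each testing the m constellation points once."""
    return n * m


if __name__ == "__main__":
    # 8-PSK (m = 8) with 2, 4 and 8 antennas per side.
    for n in (2, 4, 8):
        print(n, joint_detection_hypotheses(8, n), layered_detection_hypotheses(8, n))
```

Under this simplified model the joint count grows exponentially in N while the layered count grows linearly, which is the point made in the paragraph above.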

This paper includes two important contributions. First, we 
show that Foschini's lower bound is, in fact, the Shannon 
bound when the output SNR of the space-time processing in 
each layer is represented by the corresponding "matched filter" 
bound [6], i.e., the maximum SNR achievable in a hypothetical 
situation where the array processing weights to suppress the 
remaining interference in each layer are chosen to maximize the 
output signal-to-interference-plus-noise ratio and any possible 
intersymbol interference (ISI) is assumed to be completely 
eliminated by some means of equalization. The "matched filter" 
bound has been shown to be approachable using minimum 
mean-square error (MMSE) space-time filtering techniques 
[6] (see footnote 1). By showing the equivalence of the generalized 
Foschini's bound and the Shannon bound, we essentially prove the 
optimality of the layered space-time concept. 

Second, we present an embodiment of Foschini's layered 
space-time concept for a coded system operating at a low average 
SNR and in the presence of unavoidable ISI. Previously, 
a different embodiment has been provided in [9] for an uncoded 
system with variable signal constellation sizes, operating at 
a high average SNR without ISI. Adding coding redundancy 
might, at first, seem to conflict with the desire to increase the 
channel bit rate. Our justification is as follows: First, we seek 
to enhance the channel capacity from a system perspective. 
We use "noise" in SNR to represent all system impairments, 
including thermal noise and multiuser interference. The ability 
to operate at low SNR's means that more users per unit area 
can occupy the same bandwidth simultaneously. Second, we 
anticipate the use of adaptive-rate coding schemes to permit 
different degrees of error protection according to the channel 
SNR's. Incremental redundancy transmission [10], currently 
being considered for the Enhanced Data Services for GSM 
Evolution (EDGE; GSM stands for Global System for Mobile 
Communications) standard, is an efficient way to implement 
adaptive code rates without requiring channel SNR monitoring. 
With such adaptive-rate coding, the system does not "waste" 
spectral resources under good channel conditions. 

Meanwhile, the iterative processing principle used in turbo 
and serial concatenated coding [11]-[15] has been successfully 
applied to a wide variety of joint detection and decoding problems. 
One such application is the so-called "turbo equalization" 
[16]-[19], where successive maximum a posteriori (MAP) 
processing is performed by the equalizer and channel decoder 
to provide a priori information about the transmit sequence 
to one another. Similar to the layered space-time concept, 
turbo processing allows a multi-dimensional (two-dimensional 
in this case) problem to be optimally solved with successive 
1-D processing steps without much performance penalty. In 
this paper, we apply the turbo principle to layered space-time 
processing in order to prevent decision errors produced in each 
layer from catastrophically affecting the signal detection in 
subsequent layers. 

We consider two possible coded layered space-time structures: 
one applying coding across the multiple signal processing 
layers, and the other assuming independent coding within each 
layer. Similar to [1], we assume a quasi-static random Rayleigh 
channel model, where the channel characteristics are stationary 
within each data block, but statistically independent between 
different data blocks, different antennas, and, in the case of 
dispersive multipath channels, different paths. The system is 
assumed to have similar ISI situations as in EDGE and GSM, 
where multipath dispersions may last up to several symbol 
periods [20]. We show that near-capacity performance is achievable 
using 1-D processing and coding techniques that are already 
practical and "legacy-compatible" with the EDGE standard, 
e.g., the use of bit-interleaved 8-ary phase-shift keying 
(8-PSK) with rate-1/3 convolutional coding and an equalizer 
with a similar length and structure. 

Footnote 1: In a flat fading case, MMSE array processing achieves exactly 
the "matched filter" bound performance. 

A slightly different layered space-time approach based on 
space-time coding [23], [24] has been studied in [25]. Although 
it is difficult to make a general comparison, we will see later that 
our coded layered space-time approach does by far outperform 
the results reported in [25] for N = 4 and N = 8. On the 
other hand, for N = 2, space-time coded quaternary phase-shift 
keying (QPSK) without layered processing appears to be the 
best known technique for achieving a spectral efficiency of 2 
bps/Hz. 

This paper is organized as follows. Section II provides a 
brief review of Foschini's layered space-time concept. Section 
III describes the two coded layered space-time architectures 
and presents a capacity analysis which reveals the equivalence 
of a generalized Foschini's lower bound formula and the true 
capacity bound. Section IV provides details on the array pro- 
cessing, equalization, and iterative MAP techniques. Section V 
presents performance results. A summary and conclusions are 
given in Section VI. 

II. Background Theory 

We briefly review the theory behind Foschini's layered 
space-time concept. The generalized Shannon capacity for a 
MIMO Rayleigh fading system with N transmit and M receive 
antennas is given in [1] as 

    C = \log_2\left[\det\left(I + \frac{\rho}{N} H H^{\dagger}\right)\right]    (1)

where H is an M x N matrix, the (i, j)th element of which 
is the normalized channel transfer function of the transmission 
link between the jth transmit antenna and the ith receive antenna, 
I is the M x M identity matrix, \rho is the average SNR 
per receive antenna, and det( ) and superscript \dagger denote 
determinant and conjugate transpose. It is assumed that the transmit 
power is equally divided among the N transmit antennas. The 
normalization of the channel transfer function is done such that 
the average (over Rayleigh fading) of its squared magnitude is 
equal to unity. 
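
As a minimal sketch of how (1) can be evaluated for one channel realization, the Python code below builds I + (rho/N) H H^dagger and takes the base-2 logarithm of its determinant. The helper names `determinant` and `mimo_capacity`, and the convention that `rho` is a linear (not dB) SNR, are assumptions for illustration; only the standard library is used.

```python
import math


def determinant(matrix):
    """Determinant of a square complex matrix (Gaussian elimination, partial pivoting)."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1 + 0j
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) == 0:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def mimo_capacity(h, rho):
    """Evaluate (1) in bits/s/Hz for an M x N channel matrix h (list of M rows)."""
    m, n = len(h), len(h[0])
    a = []
    for i in range(m):
        row = []
        for j in range(m):
            # (H H^dagger)_{ij} = sum_k H_ik * conj(H_jk)
            hh = sum(h[i][k] * complex(h[j][k]).conjugate() for k in range(n))
            row.append((1.0 if i == j else 0.0) + (rho / n) * hh)
        a.append(row)
    # The matrix is Hermitian positive definite, so its determinant is real.
    return math.log2(determinant(a).real)
```

For example, with H equal to the 2 x 2 identity and rho = 2, the matrix inside the determinant is 2I, so the sketch returns log2(4) = 2 bits/s/Hz.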

The lower bound on capacity is provided in [1] as 

    C > \sum_{k=N-M+1}^{N} \log_2\left[1 + \frac{\rho}{N}\chi^2_{2k}\right] = C_F    (2)

where \chi^2_{2k} is a chi-squared random variable with 2k degrees of 
freedom. For M = N 

    C_F = \sum_{k=1}^{N} \log_2\left[1 + \frac{\rho}{N}\chi^2_{2k}\right].    (3)
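
The bound in (3) can be explored numerically with a short Monte Carlo sketch. The code below is illustrative only: the function names are hypothetical, and it assumes that each chi-squared variate with 2k degrees of freedom is normalized to have mean k, which is modeled here as a Gamma(k, 1) draw.

```python
import math
import random


def foschini_lower_bound(rho, chi_squared):
    """Evaluate (3) for M = N from one realization of the N chi-squared variates.

    chi_squared[k - 1] holds the variate with 2k degrees of freedom, k = 1..N.
    """
    n = len(chi_squared)
    return sum(math.log2(1.0 + (rho / n) * x) for x in chi_squared)


def sample_chi_squared(n, rng):
    """Draw the N variates (assumed normalization: Gamma(k, 1), mean k)."""
    return [rng.gammavariate(k, 1.0) for k in range(1, n + 1)]


def average_lower_bound(rho, n, trials=2000, seed=0):
    """Average of (3) over independent fading realizations."""
    rng = random.Random(seed)
    total = sum(foschini_lower_bound(rho, sample_chi_squared(n, rng))
                for _ in range(trials))
    return total / trials
```

With rho = 2 and the fixed values [1.0, 3.0], the first function gives log2(2) + log2(4) = 3 bits/s/Hz, which is a convenient hand check.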

Since \chi^2_{2k} represents a fading channel with a diversity order 
of k, the lower-bound capacity in (3) can be viewed as the sum 
of the capacities of N independent channels with increasing 
diversity orders from 1 to N. This suggests a layered space-time 
approach [7] for detecting the N transmitted signals as follows: 

In the first layer, the receiver detects a first transmitted signal 
by nulling out interference from N - 1 other transmitted signals 
through array processing. Assuming a "zero forcing" (ZF) constraint, 
one receive antenna is needed to completely correlate 
and subtract each interference [5]. Thus, the overall process of 
nulling N - 1 interferences leaves the receiver with N - (N - 1) = 1 
degree of freedom to provide diversity for detecting the 
first signal, i.e., a diversity order of 1 (or simply no diversity). 
Once detected, the first signal is subtracted out from the received 
signals on all N antennas. 

In the second layer, the receiver performs similar interference 
nulling to detect a second transmitted signal. This time, since 
there are only N - 2 remaining interferences, the receiver affords 
a diversity order of 2. The detected signal is again subtracted 
out from the received signals provided by the first layer. 

Repeating the above interference nulling/canceling step 
through N layers, we see that the receiver affords an increasing 
order of diversity from 1 to N. If the capacities achieved in 
individual layers can be combined in some manner, then the 
layered space-time approach just mentioned will achieve the 
capacity lower bound expressed in (3). We will explore two 
capacity combining possibilities in the next section. 
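
The nulling/canceling steps described above can be sketched for the smallest case, N = M = 2, under the ZF constraint. This is an illustrative sketch only: the function names are hypothetical, hard decisions are used, no layer ordering is applied, and the second layer uses maximal-ratio combining because no interferer remains.

```python
def nearest_symbol(value, constellation):
    """Hard decision: the constellation point closest to the soft estimate."""
    return min(constellation, key=lambda s: abs(value - s))


def layered_detect_2x2(h, y, constellation):
    """Two-layer zero-forcing detection for N = M = 2.

    h[i][j] is the gain from transmit antenna j to receive antenna i;
    y holds the two received samples.
    """
    (h11, h12), (h21, h22) = h

    # Layer 1: the weights (h22, -h12) are orthogonal to the column of stream 2,
    # so stream 2 is nulled and only det(H) * x1 (plus noise) remains.
    det = h11 * h22 - h12 * h21
    x1_soft = (h22 * y[0] - h12 * y[1]) / det
    x1_hat = nearest_symbol(x1_soft, constellation)

    # Cancel the detected stream from the signals on both receive antennas.
    r = [y[0] - h11 * x1_hat, y[1] - h21 * x1_hat]

    # Layer 2: no interference is left, so both antennas are combined,
    # which corresponds to a diversity order of 2.
    energy = abs(h12) ** 2 + abs(h22) ** 2
    x2_soft = (h12.conjugate() * r[0] + h22.conjugate() * r[1]) / energy
    x2_hat = nearest_symbol(x2_soft, constellation)
    return x1_hat, x2_hat
```

In a noise-free test with a well-conditioned channel the sketch recovers both transmitted symbols exactly; with noise, an error in the first layer propagates to the second through the cancellation step.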

Note that the capacity and capacity lower bound given in 
(1)-(3) are actually frequency-dependent. We here provide an 
explicit capacity formula for band-limited, frequency-selective 
channels (some variables are redefined to be consistent with 
later analytical development). 

    C = \left\langle \log_2\left[\det\left(\mathcal{R}\,\mathcal{N}^{-1}\right)\right] \right\rangle    (4)

where, as shown in equations (5)-(8), \mathcal{R} is the 
frequency-domain correlation matrix of the signals 
on M receive antennas, N_j(f) is the noise power density at 
frequency f on the jth receive antenna, T is the symbol period, 
H_{ij}(f) is the channel transfer function (not normalized) of the 
transmission link between the ith transmit antenna and the 
jth receive antenna, and superscripts * and T denote complex 
conjugate and transpose. Note in (7) and (8) that we consider 
the folded spectra H_{ij}(f - (m/T)) and N_j(f - (m/T)) of 
the channel transfer function and noise power density, where 
m = -J, ..., J (J is finite because the signal sources are 
assumed to be band-limited). This is to take into account the 
effect of excess bandwidth and symbol-rate sampling when 
the frequency selectivity of the channel is not symmetrical 
around the Nyquist band edges. Even though we assume white 
Gaussian noise, the noise power density near and outside the 
Nyquist band edges actually attenuates with the receive filter 
transfer function. From our experiment (assuming a square-root 
Nyquist filter with a 50% rolloff factor), the computed capacity 
can be underestimated by as much as 0.5 dB if this attenuation 
is not taken into account. 

III. Coded Layered Space-Time Architectures 
A. Basic Concepts 

We consider two coded layered space-time approaches as 
shown in Fig. 1(a) and (b). In the first approach, named "LST-I" 
(LST stands for "layered space-time"), the coded information 
bits are interleaved across the N parallel data streams x_1, 
x_2, ..., x_N, where x_i denotes a sequence of complex-valued, 
transmit data symbols (e.g., 8-PSK symbols). The receiver first 
decouples the N data streams through interference nulling/cancellation, 
as described in Section II, then deinterleaves and 
decodes all the N data streams as one information block. In 
the second approach, "LST-II," the information is first divided 
into N uncoded bit sequences u_1, u_2, ..., u_N, each of which 
is independently encoded, interleaved, and symbol-mapped to 
generate one of the N parallel data streams. At the receiver, the 
N data streams are decoupled and independently deinterleaved 
and decoded. The output of LST-II produces N information 
blocks at a rate of 1/N times the output rate of LST-I. 

Fig. 1. Coded layered space-time architecture: (a) LST-I and (b) LST-II. 
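
The difference in how the two approaches form their N parallel streams can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, a seeded shuffle stands in for the (unspecified) interleaver, and the channel encoder and symbol mapper are omitted entirely.

```python
import random


def lst1_streams(coded_bits, n, seed=0):
    """LST-I style: interleave one coded block, then spread it over n streams."""
    order = list(range(len(coded_bits)))
    random.Random(seed).shuffle(order)  # assumed pseudo-random interleaver
    interleaved = [coded_bits[i] for i in order]
    streams = [interleaved[k::n] for k in range(n)]
    return streams, order


def lst1_recover(streams, order):
    """Receiver side of the sketch: merge the n streams, then deinterleave."""
    n = len(streams)
    total = sum(len(s) for s in streams)
    interleaved = [streams[k % n][k // n] for k in range(total)]
    coded_bits = [None] * total
    for position, source_index in enumerate(order):
        coded_bits[source_index] = interleaved[position]
    return coded_bits


def lst2_sequences(info_bits, n):
    """LST-II style: split the information into n uncoded sequences u_1..u_n,
    each of which would then be encoded and interleaved on its own."""
    size = -(-len(info_bits) // n)  # ceiling division
    return [info_bits[k * size:(k + 1) * size] for k in range(n)]
```

In the LST-I sketch a single coded block is recovered only after all N streams are available, whereas each LST-II sequence could be decoded as soon as its own stream has been detected.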

In Fig. 1(a) and (b), "space-time equalizer" refers to a combined 
array processing (for interference nulling) and equalization 
function. Instead of the ZF criterion, we assume that the 
optimization of the antenna/equalizer weights is based on an MMSE 
criterion, which in general provides better performance than a 
ZF approach. Foschini [7] has also indicated a potential 
performance benefit of using MMSE (or "maximum SNR") rather 
than ZF in a layered space-time architecture. Although we show 
M receive antennas in Fig. 1(a) and (b) (M ≥ N is the sufficient 
condition for nulling N - 1 interferences), we only consider 
M = N in this study. 

Similar to [9], the underlying assumption of our layered 
space-time architecture is that the receiver can order the 
detections of N data streams such that an undetected layer always 
has the strongest received SNR. In LST-I, the space-time 
equalizer in each layer must provide data decisions x_{\lambda(i)} 
(\lambda denotes the permutation due to layer ordering) to the 
interference canceller, since decoding cannot be done until all the 
layers are processed. In LST-II, the interference cancellation in each 
layer can use more reliable data decisions u_{\lambda(i)} provided by 
the decoder. Thus, LST-I is more prone to decision errors than 
LST-II. In order to minimize the effects of decision errors, and 
also to improve the joint detection/decoding performance in 
general, we assume the use of turbo processing in our layered 
space-time architecture. As shown in Fig. 1(a) and (b), the 
space-time equalizers and the decoders provide extrinsic soft 
information to one another by subtracting the received soft 
information from the newly computed soft information. Details 
on MMSE space-time equalization and turbo processing will 
be provided in Section IV. 
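
The extrinsic-information exchange amounts to a per-bit subtraction. The sketch below assumes, for illustration, that soft information is held as one log-likelihood ratio per bit; the function name is hypothetical and the equalizer and decoder themselves are not modeled.

```python
def extrinsic_information(new_soft, received_soft):
    """Extrinsic values passed to the other module: newly computed soft
    information minus the soft information that module originally supplied."""
    if len(new_soft) != len(received_soft):
        raise ValueError("soft-information sequences must have equal length")
    return [new - old for new, old in zip(new_soft, received_soft)]


if __name__ == "__main__":
    # The decoder refines the values it received from the equalizer and
    # hands back only what it has added.
    from_equalizer = [0.5, -0.25, 0.5]
    decoder_output = [2.5, -1.0, 0.5]
    print(extrinsic_information(decoder_output, from_equalizer))
```

Subtracting the incoming values keeps each module from being fed back its own earlier opinion, which is the usual motivation for exchanging extrinsic rather than full soft information.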

B. Capacity Analysis 

Without getting into the detail of all the processing functions, 
we first discuss the general differences between the two coded 
layered space-time approaches. In particular, we are most inter- 
ested in the capacity combining aspects of the two approaches. 

Let SNR_k denote the output SNR of the array processing in 
the kth layer. First, we note that, in LST-II, the capacity of each 
processing layer is bounded by the spectral efficiency R of the 
modulation and coding in each layer, e.g., R = 1 for 8-PSK 
with rate 1/3 coding. Thus, the total capacity of LST-II is given 
by (similar to (1)-(3), we write capacity without showing the 
frequency dependence) 

    C_{LST-II} = \sum_{k=1}^{N} \min\left\{R, \log_2\left[1 + \mathrm{SNR}_k\right]\right\}.    (9)

Without layer ordering, it is most likely that the overall perfor- 
mance of LST-II will be largely influenced by the error proba- 
bility of the first processing layer with a diversity order of only 
1. In contrast, our simulation results in Section V will show that 
LST-II with layer ordering can actually achieve a diversity order 
of approximately N. 

Since coding is performed across all the processing layers in
LST-I, the achieved SNR in each layer will contribute to the
overall layer processing performance. As Foschini [7] indicated,
such a coding scheme should be able to achieve the capacity
lower bound in (3). Here, we provide a generalized formula of
Foschini's lower bound by removing the ZF constraint and
instead using $\mathrm{SNR}_k$ as the generalized output SNR:

$$C_F = \sum_{k=1}^{N} \log_2[1 + \mathrm{SNR}_k]. \qquad (10)$$

Reference [6] provides output SNR formulas for different
types of optimum space-time processors. Here, it is of great
interest to express the capacity lower bound using the best
performance achievable. In the following equation, we represent
$\mathrm{SNR}_k$ in (10) by the "matched filter" bound, the maximum
SNR achievable by any space-time processing receiver:

$$C_{F,\mathrm{MF}} = \int_W \sum_{k=1}^{N} \log_2[1 + \Gamma_k(f)]\,df \qquad (11)$$

where $\Gamma_k(f)$ is the "matched filter" bound² given by equation
(15) in Section IV-A (simply a rewriting of the result in [6]).

² Note that the "matched filter" bound usually refers to the integrated SNR
($\Gamma_k(f)$) over the signal bandwidth (e.g., [6]). However, in the capacity context,
we assume the best possible way to exploit the SNRs in all frequency components.

Fig. 2. Space-time DDFSE with MAP processing.

Note that (11) is an explicit formula similar to (4); it shows the
frequency dependence of the output SNR and the integration of
capacity over the signal bandwidth. Also, we assume that the
kth layer has k − 1 interferers.

In the process of analyzing the meaning of (11), we discovered
that (11) and (4) are identical regardless of
how the layers are ordered. We show the proof in the Appendix
(this proof is valid even when M ≠ N). Thus, Foschini's lower
bound (3) is actually the true Shannon capacity bound when
the output SNR of the space-time processing in each layer is
represented by the corresponding "matched filter" bound. This
proves the optimality of the layered space-time concept.

The capacity analysis presented above is based on the as- 
sumption of perfect layer detection, i.e., no decision errors af- 
fecting the detection in subsequent layers. In reality, LST-I is 
more prone to decision errors than LST-II and layer ordering 
becomes important for both schemes. Our simulation results in 
Section V will demonstrate how decision errors affect the actual 
performance of the two coded layered space-time approaches. 

IV. Signal Processing Functions 
A. Space-Time Equalization 

We consider combined array processing and equalization in
order to cope with dispersive channels. A space-time equalizer,
consisting of a spatial/temporal whitening filter, followed by
a decision-feedback equalizer (DFE) or maximum-likelihood
sequence estimator, can suppress both ISI and dispersive
interference [6]. The space-time equalizer used in this study
is shown in Fig. 2. It consists of a linear feedforward filter
$W_j(f)$, j = 1, ..., M, on each diversity branch, a combiner,
symbol-rate sampler, soft-input, soft-output (SISO) MAP
sequence estimator, and synchronous linear feedback filter
B'(f). The feedforward filters $\{W_j(f)\}$ are shown as
continuous-time filters, but they can be implemented in practice
using fractionally-spaced tapped delay lines. The combined
use of a sequence estimator and feedback filter after diversity
combining is similar to the structure of a delayed decision-feedback
sequence estimator (DDFSE) [27]. Thus, we refer to the
space-time equalizer in Fig. 2 as a "space-time DDFSE." A
"space-time DFE" is a structure where the sequence estimator
is replaced by a memoryless hard slicer.

It has been shown in [20] that a space-time DDFSE with a
sequence estimator memory of μ and a feedback filter of length
$L_B − μ$ can be optimized in an MMSE manner as if it were a
space-time DFE with a feedback filter of length $L_B$. In fact,
numerical results in [6] showed that an optimum space-time
DFE (with unconstrained filter lengths and no feedback decision
errors) can perform within only 1-2 dB of the ideal "matched
filter" bound performance. Thus, in order to have a practical
receiver structure for layered space-time processing, we
consider a space-time DDFSE with a minimum sequence estimator
memory, i.e., μ = 1. The sequence estimator is used only to
provide a trellis structure needed for turbo processing, and
presumably more reliable feedback decisions than the slicer used
in a space-time DFE. Details on MAP processing will be given
in Section IV-B.

We first provide a brief review of the space-time filtering
theory. Based on the space-time DFE equivalent model
described above, the MMSE solution for the feedforward filters
$\{W_j(f)\}$ with unconstrained length can be given using the
results of [6] (see also [28])

$$W = R_k^{-1} H_k^{*}\,(1 + B(f)) = R_{k-1}^{-1} H_k^{*}\,\frac{1 + B(f)}{1 + \Gamma_k(f)} \qquad (12)$$

where

$$R_k = \sum_{j=1}^{k} H_j^{*} H_j^{T} + K \qquad (13)$$

$$W = [W_1(f - 1/T) \cdots W_M(f - 1/T) \cdots W_1(f + 1/T) \cdots W_M(f + 1/T)]^{T} \qquad (14)$$

$$\Gamma_k(f) = H_k^{T} R_{k-1}^{-1} H_k^{*}. \qquad (15)$$

In the above equations, we assume that there are a total of k
signal sources and we use $H_k$ to indicate the channel vector
[see (8)] of the desired signal. The remaining k − 1 signals
are interferers. B(f) in (12) is the feedback filter of the
space-time DFE, which, from our assumption of μ = 1, is
only "1 tap" longer than B'(f). $\Gamma_k(f)$ is the
signal-to-interference-plus-noise power density ratio at frequency f, i.e.,
the "matched filter" bound. B(f) can be determined through
spectral factorization of $1 + \Gamma_k(f)$.

Equation (12) indicates that the optimum feedforward filter
consists of a spatial/temporal filter $R_{k-1}^{-1} H_k^{*}$, which performs
prewhitening ($R_{k-1}^{-1/2}$ is the whitening filter of interference and
noise) and matching to the desired channel, followed by a
temporal filter $(1 + B(f))/(1 + \Gamma_k(f))$, which is an anticausal
post-whitening filter for suppressing precursor ISI.

A filter length analysis of the optimum space-time DFE
described above is provided in [6]. We will first consider a
finite-length realization of the space-time DFE based on the
results presented there. We assume that the system has ISI
conditions similar to those in EDGE and GSM. Namely, using the
multipath delay profiles specified for EDGE and GSM (see
Tables I and II), and assuming the same symbol rate of 270.833
kbaud (T = 3.692 μs) with Nyquist filtering (partial response
signaling is used in EDGE and GSM), the ISI lasts up to five
symbol periods for the hilly terrain (HT) profile in Table II.

TABLE I
GSM Typical Urban (TU) Channel Model

Path Delay (μs)    0.0    0.2    0.5    1.6    2.3    5.0
Path Power (dB)   -3.0    0.0   -2.0   -6.0   -8.0  -10.0

TABLE II
GSM Hilly Terrain (HT) Channel Model

Path Delay (μs)    0.0    0.2    0.4    0.6   15.0   17.2
Path Power (dB)    0.0   -2.0   -4.0   -7.0   -6.0  -12.0

According to the empirical filter length formulas in [6], the
feedforward filter on each branch should have the following causal
and anticausal lengths to achieve near-optimum performance

$$L_A \approx K + K N (\rho_{dB}/10) \qquad (16)$$

where K is the channel memory, N is used here to indicate
the total number of signals, including the desired and
interference signals, and $\rho_{dB}$ is the average SNR in decibels. In our
case, K = 5; assume, for example, that the system has four
transmit and four receive antennas (N = 4) and the operating
range of average SNR is around 5 dB ($\rho_{dB}$ = 5). The required
filter length, including the center tap, will be $L_C + L_A + 1 \approx 23.5$.
This is a highly impractical number, considering that four
such filters are required, one per receive antenna. Furthermore,
as mentioned earlier, the optimum feedforward filters
should be implemented using fractionally-spaced tapped delay
lines. If T/2-spaced filters are used, the total number of taps
will be doubled. Such a space-time system with about 200
coefficients would be nearly impossible to compute in any radio
link design.
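The arithmetic behind these numbers can be retraced with a short Python sketch. It assumes that the surviving formula in (16) is the anticausal length, $L_A \approx K + KN(\rho_{dB}/10)$; the causal length is not computed from any formula here but only backed out from the stated total of 23.5. All variable names are illustrative.

```python
def anticausal_length(K, N, rho_db):
    """Anticausal feedforward length, assuming (16) reads K + K*N*(rho_dB/10)."""
    return K + K * N * (rho_db / 10.0)


K, N, rho_db = 5, 4, 5.0           # channel memory, number of signals, SNR in dB
L_A = anticausal_length(K, N, rho_db)

total_per_branch = 23.5            # L_C + L_A + 1, as stated in the text
L_C_implied = total_per_branch - L_A - 1   # inferred by subtraction only

# Four feedforward filters (one per receive antenna), doubled for T/2 spacing.
taps_symbol_spaced = total_per_branch * N
taps_half_symbol_spaced = 2 * taps_symbol_spaced

print(L_A, L_C_implied, taps_symbol_spaced, taps_half_symbol_spaced)
```

Under these assumptions the T/2-spaced count comes to 188 coefficients, which is consistent with the "about 200" figure quoted above.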

Faced with such impracticality of an ideal signal processing
arrangement, we proceed to consider a suboptimum option.
First, we will use symbol-spaced instead of fractionally-spaced
feedforward filters. In order to avoid significant performance
penalties, a channel estimation-based timing recovery algorithm
described in [29] will be used to optimize the symbol
timing and the decision delay of the center tap relative to the
measured channel impulse response. In principle, such timing
optimization also allows the DFE to use a feedforward filter
with a shorter span than the channel memory while achieving
a reasonable performance [29], [30]. After experimenting with
a number of significantly reduced filter length options, we
decided on the following suboptimum space-time equalizer
structure. The feedforward filter on each branch has a total
of nine symbol-spaced taps, which are positioned such that
$L_C = L_A = 4$. The feedback filter has a length of 8, i.e.,
$L_B = 9$ with the MAP processor memory μ = 1 included
(in order to completely cancel postcursor ISI, $L_B$ must be at
least as large as the channel memory plus the number of causal
taps in the feedforward filter). The method in [29] is used to
optimize the symbol timing and the decision delay of the center
tap as described above. Direct matrix inversion is used to set
all the filter coefficients in a standard MMSE linear processing
fashion [4], [26], [31], assuming perfect channel estimation.
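To illustrate what direct matrix inversion means in the simplest setting, the following Python sketch computes single-tap spatial MMSE weights for a flat channel. This is a reduced special case chosen for illustration, not the nine-tap space-time structure described above. It uses the $w^{H} r$ combining convention (the equations above use $W^{T}$ with conjugated channel vectors), and all function names are assumptions.

```python
import numpy as np


def mmse_weights(h_desired, h_interferers, noise_var):
    """Spatial MMSE weights (up to a scale factor) by direct matrix inversion.

    The interference-plus-noise covariance is built from the interferer
    channel vectors and white noise; the weights solve R_in w = h_desired.
    """
    m = h_desired.shape[0]
    r_in = noise_var * np.eye(m, dtype=complex)
    for h in h_interferers:
        r_in += np.outer(h, h.conj())
    w = np.linalg.solve(r_in, h_desired)
    return w, r_in


def output_sinr(w, h_desired, r_in):
    """SINR at the combiner output y = w^H r."""
    signal = abs(np.vdot(w, h_desired)) ** 2
    interference_plus_noise = np.real(np.vdot(w, r_in @ w))
    return signal / interference_plus_noise


rng = np.random.default_rng(0)
h_d = rng.standard_normal(4) + 1j * rng.standard_normal(4)
h_i = [rng.standard_normal(4) + 1j * rng.standard_normal(4) for _ in range(3)]
w, r_in = mmse_weights(h_d, h_i, noise_var=0.5)

# For these weights the output SINR equals h^H R_in^{-1} h.
print(output_sinr(w, h_d, r_in))
```

In this flat, single-tap case the resulting SINR coincides with the quadratic form $h^{H} R^{-1} h$, the flat-channel analogue of the "matched filter" bound in (15).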

B. Turbo Processing 

The turbo processing technique used in this study is also
based on a standard approach; the reader is referred to the
rich literature [11]-[19], [21]-[22], [32]-[35] for a thorough
treatment of this subject. The space-time equalizer and the
decoder both perform SISO sequence estimation to compute
the a posteriori probability (APP) of the transmit data symbols.
This sequence estimation is done using the
Bahl-Cocke-Jelinek-Raviv (BCJR) forward/backward algorithm. In the
following, we describe the basic principle of the iterative
detection/decoding process.

Using the BCJR algorithm, the MAP processor in the
space-time equalizer with $m^{\mu}$ states (m is the signal
constellation size, e.g., m = 8 for 8-PSK, and μ = 1 in our case)
computes the APP $P[c_k|y]$ of the kth coded bit $c_k$ based on
the observation y, where y is the equalizer output sequence
corresponding to all the data symbols in a received block (see
Fig. 2), and the a priori information provided by the decoder
(this is not available in the first "turbo" iteration). The logarithm
$\Lambda(c_k) = \log_e(P[c_k|y])$ of this APP can be regarded as the sum
of two terms

$$\Lambda(c_k) = \lambda^{p}(c_k) + \lambda^{e}(c_k) \qquad (17)$$

where $\lambda^{p}(c_k) = \log_e(P[c_k])$ is the logarithm of the a priori
information provided by the decoder, and $\lambda^{e}(c_k)$ is called the
"extrinsic" information. In each "turbo" iteration, the space-time
equalizer subtracts $\lambda^{p}(c_k)$ from the newly computed value of
$\Lambda(c_k)$ to obtain the extrinsic information $\lambda^{e}(c_k)$ [see Fig. 1(a)
and (b)]. The entire sequence $\{\lambda^{e}(c_k)\}$ is deinterleaved and
forwarded to the decoder.

Similarly, the decoder computes the log-APP
$\nu(c_k) = \log_e(P[c_k|\{\lambda^{e}(c_k)\}])$ based on the deinterleaved
extrinsic information provided by the space-time equalizer,
and subtracts $\lambda^{e}(c_k)$ from it to obtain extrinsic information
$\nu^{e}(c_k)$. The extrinsic information is then interleaved and
forwarded to the equalizer as the new a priori information
$\lambda^{p}(c_k)$ for the next "turbo" iteration.
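The subtraction step used by both the equalizer and the decoder can be sketched in a few lines of Python. The soft values below are made-up numbers for illustration only; the SISO computations that would produce them are not shown.

```python
def extrinsic(posterior, prior):
    """Element-wise posterior minus prior soft information, as in (17)."""
    if len(posterior) != len(prior):
        raise ValueError("sequences must have the same length")
    return [post - pri for post, pri in zip(posterior, prior)]


# Hypothetical soft values for four coded bits.
prior_from_decoder = [0.5, 1.0, -0.25, 0.0]
posterior_at_equalizer = [1.5, -2.0, 0.25, 0.75]

# Only the newly gained information is passed on to the other stage.
to_decoder = extrinsic(posterior_at_equalizer, prior_from_decoder)
print(to_decoder)
```

Passing only the difference keeps each stage from being fed back the information it supplied in the previous iteration.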

The interleaver considered in this study is a pseudo-random
interleaver, i.e., we generate a pseudo-random permutation of
numbers from 1 to L, where L is the block length, and then use
this permutation as a fixed interleaver.
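A fixed pseudo-random interleaver of this kind can be sketched as follows. The seed value and function names are illustrative assumptions, and indices run from 0 to L − 1 rather than 1 to L.

```python
import random


def make_interleaver(block_len, seed):
    """Generate a fixed pseudo-random permutation of 0..block_len-1."""
    perm = list(range(block_len))
    random.Random(seed).shuffle(perm)
    return perm


def interleave(seq, perm):
    """Output position i takes the element at input position perm[i]."""
    return [seq[p] for p in perm]


def deinterleave(seq, perm):
    """Inverse of interleave for the same permutation."""
    out = [None] * len(perm)
    for i, p in enumerate(perm):
        out[p] = seq[i]
    return out


perm = make_interleaver(12, seed=1)      # hypothetical short block
bits = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0]
assert deinterleave(interleave(bits, perm), perm) == bits
```

Because the permutation is generated once from a fixed seed, the transmitter and receiver can reproduce the same interleaver without exchanging it.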

In combining the branch metric obtained from the equalizer
output with the branch metric obtained from the soft input
provided by the decoder, the MAP processor in the space-time
equalizer must compute the a priori information for each 8-PSK
symbol $x_k$ from the three soft inputs ($\lambda^{p}(c_{3k})$, $\lambda^{p}(c_{3k+1})$,
$\lambda^{p}(c_{3k+2})$). We assume that this is done by way of summing
the three soft inputs as if the three coded bits were transmitted
from independent sources (these soft inputs are actually not
independent when conditioned on the observed waveform of the
entire data burst). This is a suboptimal method, which is known
to cause a "random modulation" performance degradation
effect in bit-interleaved coded modulation [36]-[38]. However,
this effect can be overcome by iterative decoding [38], which is
implicit in our turbo space-time processing approach.
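The summation of the three bit-level soft inputs can be sketched in Python as below. The representation of each soft input as a pair (log P[c = 0], log P[c = 1]) and the use of bit triples as symbol labels are assumptions made for illustration; the actual Gray mapping of labels to constellation points is not needed for this step.

```python
import math
from itertools import product


def symbol_log_priors(bit_log_probs):
    """A priori log-probability of each 3-bit symbol label.

    bit_log_probs: three pairs (log P[c=0], log P[c=1]), one per coded bit.
    The three bits are treated as independent, so the logs are summed.
    """
    return {
        bits: sum(bit_log_probs[i][b] for i, b in enumerate(bits))
        for bits in product((0, 1), repeat=3)
    }


# Hypothetical probabilities that each of the three coded bits equals 1.
p_one = [0.9, 0.6, 0.5]
bit_lp = [(math.log(1.0 - p), math.log(p)) for p in p_one]
priors = symbol_log_priors(bit_lp)

# The eight symbol priors form a valid distribution.
print(sum(math.exp(v) for v in priors.values()))
```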

As noted earlier, in LST-I, the space-time equalizer in each 
layer must provide immediate data decisions to be used for inter- 
ference cancellation. Since these decisions are not "protected" 
by coding, they are prone to errors. In this study, we explore 
a soft decision technique to minimize the effect of decision er- 
rors. The optimum soft decision can be computed by averaging 
all the possible transmit symbols weighted by their APP's [39] 

$$\hat{x}_k = \sum_{x \in \mathcal{X}} x\,P[x_k = x\,|\,y] \qquad (18)$$

where $\mathcal{X}$ includes all the complex-valued 8-PSK constellation
points. Since $P[x_k = x|y]$ can be obtained along with the
computation of the APP $P[c_k|y]$, this soft decision approach can be
implemented with nearly no additional cost in complexity.
Similarly, we apply the same technique to compute soft decision
outputs in LST-II.
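The soft decision in (18) is simply the APP-weighted mean of the constellation points, as the following Python sketch shows. The ordering of the 8-PSK points and the example APP vectors are illustrative assumptions.

```python
import cmath
import math


def psk8_points():
    """Unit-energy 8-PSK constellation points (ordering is arbitrary here)."""
    return [cmath.exp(2j * math.pi * i / 8) for i in range(8)]


def soft_decision(app):
    """Evaluate (18): sum of constellation points weighted by their APPs."""
    return sum(p * x for p, x in zip(app, psk8_points()))


# A confident APP gives a point near the constellation; an uncertain
# one shrinks the soft decision toward the origin.
confident = [0.0, 0.0, 0.97, 0.01, 0.0, 0.0, 0.01, 0.01]
uncertain = [0.125] * 8
print(abs(soft_decision(confident)), abs(soft_decision(uncertain)))
```

A shrunken soft decision cancels less interference when the equalizer is unsure, which limits the damage from a wrong guess.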

V. Performance Results 
A. Performance Criteria and System Assumptions 

We now present performance results of the layered
space-time concepts described so far. The performance
measure is the block-error rate (BLER) over Rayleigh fading.
The results are obtained through Monte Carlo simulation. The
BLER is averaged over up to 40000 channel realizations. Each
block contains 400 information bits (before coding).

In comparing the performance results to channel capacity,
we follow the convention of a number of previous works (e.g.,
[9], [23]) and compare the computed BLER with the "outage
capacity" [1], i.e., the probability that a specified bit rate is not
supported by the channel capacity. This is an imprecise comparison,
since the Shannon limit refers to the highest error-free bit rate
possible for long encoded blocks but it does not specify how
long the blocks should be. Nevertheless, such a comparison
should still be meaningful as long as the block length and BLER
are specified. This is similar to the way a bit-error rate of $10^{-5}$
is commonly used as the "error free" reference for an additive
white Gaussian noise (AWGN) channel.
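An outage probability of this kind can be estimated by Monte Carlo simulation, as in the Python sketch below. It assumes the standard flat-fading capacity expression $\log_2 \det[I + (\rho/N) H H^{\dagger}]$ with i.i.d. complex Gaussian channel entries as a stand-in for the capacity formula referenced in the text; the trial count, seed, and function names are illustrative.

```python
import numpy as np


def capacities(channels, snr_db):
    """Per-realization capacity (bps/Hz) of flat N x N channels.

    channels: array of shape (trials, N, N) with unit-variance entries.
    Assumes C = log2 det(I + (rho/N) H H^dagger).
    """
    n = channels.shape[-1]
    rho = 10 ** (snr_db / 10)
    gram = channels @ np.conj(np.swapaxes(channels, -1, -2))
    _, logdet = np.linalg.slogdet(np.eye(n) + (rho / n) * gram)
    return logdet / np.log(2)


def outage_probability(channels, snr_db, rate):
    """Fraction of realizations whose capacity falls below the target rate."""
    return float(np.mean(capacities(channels, snr_db) < rate))


rng = np.random.default_rng(0)
N, trials = 4, 2000
H = (rng.standard_normal((trials, N, N))
     + 1j * rng.standard_normal((trials, N, N))) / np.sqrt(2)

# Target rate R = N bps/Hz, evaluated at two average SNRs.
p_low = outage_probability(H, snr_db=5.0, rate=N)
p_high = outage_probability(H, snr_db=15.0, rate=N)
print(p_low, p_high)
```

Reusing the same channel realizations at every SNR makes the estimated outage curve non-increasing in SNR, which is convenient when comparing it against a simulated BLER curve.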

In order to assess the best performance achievable, we
assume that the channel characteristics can be perfectly estimated
at the receiver. Similarly, the choices of 1-D processing and
coding techniques are important to deliver the best possible
performance. We try to optimize these choices while keeping
them as practical as possible. Except for the use of array
processing and iterative MAP algorithms, all the radio link
techniques assumed in this study are "legacy-compatible" with
the EDGE standard (note also that vast research interest in turbo
coding has made simplified MAP algorithms available [21],
[22] that are not much more complex than the conventional
Viterbi algorithm). None of these techniques are claimed
to be optimum. Yet, our results indicate that near-capacity
performance is achievable when combining them through the
coded layered space-time architectures.

Fig. 3. Performance of bit-interleaved 8-PSK with rate-1/3 convolutional and
turbo coding over (a) AWGN channel, and (b) quasi-static flat Rayleigh fading
channels with N receive diversity antennas.

B. Choosing the Coding Scheme 

We consider a bit-interleaved coded modulation scheme
using 8-PSK with Gray mapping and rate-1/3 coding.
Square-root Nyquist filtering with 30% rolloff is assumed at the 
transmitter and receiver. Bit-interleaved coded modulation has 
been shown [36], [37] to outperform traditional trellis-coded 
modulation in fast fading channels (where time diversity can 
be exploited through sufficient interleaving) and it can be 
improved upon by considering a better mapping technique 
that permits a large Euclidean distance without sacrificing the 
maximum Hamming distance of the baseline coding scheme 
[38]. In this paper, though, since quasi-static fading is our 
basic assumption, the code by itself must be able to withstand 
deep fades. In principle, any code that performs well in an 
AWGN channel is considered a good candidate — turbo codes 
are among the strongest candidates that come to mind. 

Fig. 3(a) and (b) provide a performance comparison between
two rate-1/3 coding schemes: one using a 64-state convolutional
code with (octal) generators $(G_1, G_2, G_3)$ = (155, 117, 123)
(the same code as proposed for EDGE [20]) and the other using
a turbo code with two identical 16-state recursive encoders
similar to the scheme originally proposed by Berrou and Glavieux
[12] (the results here assume generators $(G_1, G_2)$ = (23, 31),
which appeared to perform slightly better than other generators
we tested). Both schemes assume the use of bit-interleaved
8-PSK with Gray mapping. The turbo coding scheme has an
additional interleaver within the encoder, which uses another
pseudo-randomly generated permutation. The receiver structure
is consistent with what we have described so far. Note, however,
that we assume a minimum number of filter taps (only one
feedforward tap per branch and no feedback filter) whenever there
is no delay spread assumed, although the MAP processor in the
DDFSE is always used for iterative detection/decoding as
described in Section IV-B. For the turbo coding scheme, "one
iteration" means a full cycle of three processes: 1) MAP processing
in DDFSE; 2) turbo decoding by the first decoder; and 3) turbo
decoding by the second decoder.

Fig. 3(a) shows the performance of the two coding schemes in
an AWGN channel. First, we note that the performance of
convolutional coding also benefits from iterative processing. This
is due to the suboptimal nature of the decoding scheme, i.e.,
the "random modulation" effect described earlier, which can be
improved through iterative decoding. Fig. 3(a) shows that most
of the improvement is achieved within two decoding cycles. For
turbo coding, the performance still improves even after five
iterations, but saturates quickly after ten iterations. At $10^{-3}$ BLER
(approximately equivalent to $10^{-5}$ bit-error rate), turbo coding
outperforms convolutional coding by about 2.2 dB, and the
required SNR is within only 2.4 dB of the 0-dB Shannon limit for
a spectral efficiency of 1 bps/Hz (8-PSK with rate-1/3 coding).

However, when we look at the average BLER performance
over quasi-static flat fading channels in Fig. 3(b), the benefit
of turbo coding (with ten iterations) over convolutional coding
(with two iterations) is reduced to only about 0.5 dB at any value
of the average SNR and for all the assumed numbers of receive
diversity antennas. This is not surprising for two reasons. First,
it is well known that the average BLER is determined mostly
by the probability of fading events that result in high BLERs.
If we look at the relative performance at a BLER of, say, above
10% in Fig. 3(a), the difference between the two coding schemes
is indeed less than 1 dB. Second, the performance of
convolutional coding over fading channels is already within about 2 dB
of the capacity bound; the capacity bound in this case is
defined as the probability that the combined output SNR of all
diversity branches is below the 0-dB Shannon limit. Thus, there
is not much room for further improvement.
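For reference, a capacity bound of this form has a closed-form expression under some simplifying assumptions, sketched in Python below. The sketch assumes independent Rayleigh-faded branches with equal average SNR per branch and ideal maximal-ratio combining, so that the combined SNR follows an Erlang distribution; whether "average SNR" in the figures is defined per branch is not stated here and is an assumption.

```python
import math


def diversity_outage(n_branches, avg_snr_db, threshold_db=0.0):
    """P[combined SNR < threshold] for n independent Rayleigh branches.

    Assumes equal average SNR per branch and maximal-ratio combining,
    so the combined SNR is Erlang distributed with n_branches stages.
    """
    x = 10 ** (threshold_db / 10) / 10 ** (avg_snr_db / 10)
    tail = math.exp(-x) * sum(x ** k / math.factorial(k)
                              for k in range(n_branches))
    return 1.0 - tail


# Outage versus the number of receive diversity antennas at 10 dB.
for n in (1, 2, 4):
    print(n, diversity_outage(n, avg_snr_db=10.0))
```

The slope of this probability versus average SNR steepens with the number of branches, which is the diversity-order behavior referred to throughout this section.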

Based on the fact that the performances of the two coding 
schemes are quite similar in quasi-static fading channels, we 
will only consider convolutional coding in the remainder of this 
paper. 

C. Layered Space-Time Performance 

We first look at the performance of the two coded layered
space-time approaches over a flat Rayleigh fading channel.
Fig. 4 shows the different capacity bounds for this channel,
assuming N = 2, 4, and 8, where N is the number of transmit
and receive antennas. Again, although we plot the results as
"block error rate," the capacity bound is defined as the
probability that the specified spectral efficiency R (R = N in this
case) is not supported by each of the differently defined channel
capacities. C denotes the Shannon capacity bound given by (4)
[which is equivalent to the generalized Foschini bound $C_{F,\mathrm{MF}}$
in (11)], $C_F$ denotes the original Foschini bound (with the ZF
constraint) in (3), and $C_{\text{LST-II}}$ is the capacity bound for LST-II
in (9) (however, the results for $C_{\text{LST-II}}$ are obtained simply
by averaging the probability that R = 1 is not supported by
each processing layer). Note that $C_{\text{LST-II}}$ can indeed provide
approximately a diversity order of N; this is attributed largely
to the use of layer ordering as discussed earlier. Note also that
all the bounds show an improvement with increasing N. This
means that the capacity actually increases more than linearly
with the number of transmit and receive antennas. However,
there is a diminishing improvement as N increases to a much
larger number.

Fig. 5. Layered space-time performance of LST-I with two transmit and two
receive antennas (N = 2). Quasi-static flat Rayleigh fading channel.

Fig. 5 shows the simulation results for LST-I with 2 transmit
and 2 receive antennas (i.e., N = 2). Three sets of results are
provided, assuming: 1) soft decisions; 2) hard decisions; and 3) 
correct decisions in each layer (note that the DDFSE always uses 
tentative decisions and provides soft outputs to the decoder). 
We see that, although soft decisions offer some improvement 
over hard decisions, the impact of decision errors is still quite 
noticeable. Fig. 6 shows similar results for N = 4. Here, the 
impact of decision errors is not as significant as the previous 
results, and turbo processing and soft decisions help to reduce 
much of this impact. With three iterations, the effect of decision 
errors almost completely disappears when using soft decisions. 
Decision errors have a lesser effect for a larger N because of the 
greater diversity order available through array processing and 
layer ordering. 

In Fig. 7, we compare the results using soft decisions and six
"turbo" iterations with the Shannon capacity bound. For N = 4
and 8, the performance of LST-I is within 2.5-3 dB of the
capacity bound at 10% BLER (and about 3-3.5 dB at 1% BLER).
Since the BLER may vary as a function of the block size,³ it
is also important to consider the processing loss by discounting
the loss due to the inefficiency of modulation and coding. As
shown in Fig. 3(b), there is already a gap of about 2 dB between
the performance of our coding scheme and the Shannon limit.
Thus, the actual loss due to layered space-time processing is
only about 0.5-1 dB at 10% BLER.

³ As an example, when we double the block size, the required average SNR
is 0.2-0.4 dB greater than the results shown here. However, this difference in
average SNR applies uniformly to all results, with or without layered space-time
processing.

Fig. 6. Layered space-time performance of LST-I with four transmit and four
receive antennas (N = 4). Quasi-static flat Rayleigh fading channel.

[Plot: block error rate versus average SNR (dB), 0 to 20 dB; curves labeled LST-I and C.]

Fig. 7. Layered space-time performance of LST-I for N = 2, 4, and 8. Soft
decisions, 6 iterations. Quasi-static flat Rayleigh fading channel.

Fig. 8 shows the performance of LST-II for N = 2, 4, and
8, compared to the Shannon and $C_{\text{LST-II}}$ bounds. Soft
decisions and two "turbo" iterations are assumed. (In this case, we
found the effect of decision errors to be marginal, i.e., the
results with hard and correct decisions were generally within 1
dB of the results shown here. Also, we found turbo processing
with more than two iterations to provide little improvement.)
At 10% BLER, the performance of LST-II is 2.5 dB from the
$C_{\text{LST-II}}$ bound, and 3.5 dB from the Shannon bound. At 1%
BLER, however, the loss compared to the Shannon bound can
be as much as 6 dB.

From the above results, we conclude that, for N = 4 and
8, LST-I outperforms LST-II by a margin of 0.5 dB (at 10%
BLER) to 3 dB (at 1% BLER). For N = 2, the performance of
LST-I is greatly affected by decision errors (note that, even in
this case, LST-I still performs as well as LST-II at 10% BLER),
whereas LST-II can reach a lower BLER at high average SNR.
Based on these results, the layered space-time approach is
not highly recommended for N = 2. As mentioned earlier,
space-time coding is a better alternative to achieve a spectral
efficiency of 2 bps/Hz. For instance, a 64-state space-time
coded QPSK can perform to within 2 dB of the Shannon
capacity bound [23].

D. Frequency-Selective Channels 

Finally, we present an example of performance results for
frequency-selective fading channels. This example assumes N = 4
and the use of soft decisions for both LST-I and LST-II. Fig. 9(a)
and (b) show the results for the TU and HT profiles, defined in
Tables I and II. Again, we only show results with two "turbo"
iterations for LST-II because little improvement can be achieved
with more iterations in this case. For both delay profiles, the
performance at 10% BLER is within 3 dB of the Shannon bound
for LST-I with six iterations, and within 4 dB for LST-II with
two iterations. At a lower BLER, the loss relative to the bound
is greater for HT than for TU. This is due to the limitation of
the suboptimum space-time equalizer structure we assume, as
already discussed in Section IV-A.



VI. Conclusion 

By deriving the generalized Shannon capacity formula and
suggesting a layered space-time architecture that attains a tight
lower bound on the capacity achievable, Foschini has laid a
significant theoretical foundation for improving the wireless
channel capacity through multiple-element array technology.
We have shown that Foschini's lower bound is actually the
true Shannon bound when the output SNR of the space-time
processing in each layer is represented by the corresponding
"matched filter" bound. We then provided two coded layered
space-time approaches as an embodiment of this concept. For
a large number of transmit and receive antennas, coding across
the layers provides a better performance than independent
coding within each layer. However, with two transmit and two
receive antennas, the former is heavily affected by decision
errors and, therefore, provides a poorer performance than the
latter.

The underlying coding and signal processing techniques used
in this study are based on practical but suboptimal approaches.
Yet, such suboptimality can be greatly compensated for by
iterative processing. Overall, our coded layered space-time
approaches can achieve a performance within about 3 dB of the
Shannon bound at 10% BLER, about 2 dB of which is a loss
due to the practical coding scheme we assume. Thus, not only
is the layered space-time architecture exactly what the Shannon
limit has prescribed in a theoretical sense, but it also provides
an attractive general methodology for improving and achieving
the wireless channel capacity.

APPENDIX 

Proving the Equivalence of Foschini Bound and 
Shannon Capacity 

Using the mathematical induction method, we will prove that
(4) and (11) are identical. In order to do so, we must show that

$$\det(R_N K^{-1}) = \prod_{k=1}^{N} (1 + \Gamma_k(f)) \qquad (19)$$

where $R_N$ in the above equation is as defined in
(13). Again, we assume that the kth layer has k − 1 interferers.
Note also that the proof provided here is independent of the
number of receive antennas M (the dimension of R) and the
way the layers are ordered.
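Before going through the proof, the identity in (19) can be checked numerically at a single frequency with the Python sketch below. It assumes $R_k = \sum_{j \le k} H_j^{*} H_j^{T} + K$ and $\Gamma_k = H_k^{T} R_{k-1}^{-1} H_k^{*}$ with $R_0 = K$, as in (13) and (15); the random channel, the noise covariance, and the function names are illustrative.

```python
import numpy as np


def det_side(H, K):
    """Left side of (19): det(R_N K^{-1}) with R_N = H^* H^T + K."""
    R = K + H.conj() @ H.T
    return np.real(np.linalg.det(R @ np.linalg.inv(K)))


def layered_product(H, K, order):
    """Right side of (19): product of (1 + Gamma_k) for a given layer order."""
    R = K.astype(complex)
    prod = 1.0
    for k in order:
        h = H[:, k]
        gamma = np.real(h @ np.linalg.solve(R, h.conj()))  # H_k^T R^{-1} H_k^*
        prod *= 1.0 + gamma
        R = R + np.outer(h.conj(), h)                       # add H_k^* H_k^T
    return prod


rng = np.random.default_rng(0)
M, N = 6, 4                                   # M != N on purpose
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
K = np.eye(M, dtype=complex)                  # white noise, for illustration

print(det_side(H, K), layered_product(H, K, [0, 1, 2, 3]))
print(layered_product(H, K, [3, 1, 0, 2]))    # a different layer order
```

In this sketch the two sides agree to numerical precision for either layer order, in line with the statement that the result does not depend on M or on the ordering.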

We start by assuming 1 signal source and M receive antennas.
It can be easily shown [1] that

$$\det(R_1 K^{-1}) = 1 + H_1^{T} K^{-1} H_1^{*} = 1 + \Gamma_1(f). \qquad (20)$$

Next, we assume that (19) is true for the case of n − 1 signal
sources, i.e.,

$$\det(R_{n-1} K^{-1}) = \prod_{k=1}^{n-1} (1 + \Gamma_k(f)). \qquad (21)$$

We then show in the following that, given (21), (19) is also true
for n signal sources.

First, we note from (13) that

$$R_n = R_{n-1} + H_n^{*} H_n^{T}. \qquad (22)$$



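Using the matrix inversion lemma [4, Appendix D], we can
show that [similar to (12)]

$$R_n^{-1} H_n^{*} = \frac{R_{n-1}^{-1} H_n^{*}}{1 + \Gamma_n(f)}. \qquad (23)$$

It follows that

$$(R_n K^{-1})^{-1} H_n^{*} = \frac{K R_{n-1}^{-1} H_n^{*}}{1 + \Gamma_n(f)} = \frac{(R_{n-1} K^{-1})^{-1} H_n^{*}}{1 + \Gamma_n(f)}. \qquad (24)$$

For convenience, let

$$A = R_n K^{-1} \quad \text{and} \quad B = R_{n-1} K^{-1}. \qquad (25)$$

We can rewrite (24) as

$$A^{-1} H_n^{*} = \frac{B^{-1} H_n^{*}}{1 + \Gamma_n(f)}. \qquad (26)$$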



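Furthermore, using the matrix identity [40]

$$A^{-1} = \frac{A_{adj}}{\det(A)} \qquad (27)$$

where $A_{adj}$ is called the adjugate matrix of matrix A, we can
rewrite (26) as

$$\frac{A_{adj} H_n^{*}}{\det(A)} = \frac{B_{adj} H_n^{*}}{\det(B)\,(1 + \Gamma_n(f))}. \qquad (28)$$

By replacing det(B) in the above equation using (21), we obtain

$$\frac{A_{adj} H_n^{*}}{\det(A)} = \frac{B_{adj} H_n^{*}}{\prod_{k=1}^{n} (1 + \Gamma_k(f))}. \qquad (29)$$

Our goal is to prove that

$$\det(A) = \prod_{k=1}^{n} (1 + \Gamma_k(f)). \qquad (30)$$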

Thus, given (29), we must show that

$$A_{adj} H_n^{*} = B_{adj} H_n^{*}. \qquad (31)$$

From (22) and (25), we have

$$A = B + H_n^{*} H_n^{T} K^{-1} = [\,b_1 + c_1 H_n^{*},\ \ldots,\ b_M + c_M H_n^{*}\,] \qquad (32)$$

where $b_j$ is the jth column vector of B and $c_j$ is the jth element
of the row vector $H_n^{T} K^{-1}$, for j = 1, ..., M;
for convenience, we use M here to indicate the overall receive
diversity order, including the effects of both multiple antennas
and excess bandwidth.

We now prove (31) by showing that the jth element
of $A_{adj} H_n^{*}$ is equal to the jth element of $B_{adj} H_n^{*}$, for
j = 1, ..., M. Note that the jth element of $A_{adj} H_n^{*}$ is given
by $\det(A_j)$, where $A_j$ is obtained by replacing the jth column
of A by $H_n^{*}$. Similarly, the jth element of $B_{adj} H_n^{*}$ is given by
$\det(B_j)$, where $B_j$ is obtained by replacing the jth column of
B by $H_n^{*}$.

Using (32) and the linear properties of determinants, we can
show that (with $H_n^{*}$ in the jth column throughout)

$$\det(A_j) = \det[\,b_1 + c_1 H_n^{*},\ \ldots,\ H_n^{*},\ \ldots,\ b_M + c_M H_n^{*}\,] \qquad (33)$$

$$= \det[\,b_1,\ \ldots,\ H_n^{*},\ \ldots,\ b_M + c_M H_n^{*}\,] + c_1 \det[\,H_n^{*},\ \ldots,\ H_n^{*},\ \ldots,\ b_M + c_M H_n^{*}\,] \qquad (34)$$

$$= \det[\,b_1,\ \ldots,\ H_n^{*},\ \ldots,\ b_M + c_M H_n^{*}\,]$$

since a determinant with two identical columns is zero.
Similarly, we can expand the result of (34) with respect to the
second column, the third column, and so on (except for the jth
column). Eventually, we obtain

$$\det(A_j) = \det[\,b_1,\ b_2,\ \ldots,\ H_n^{*},\ \ldots,\ b_M\,] = \det(B_j) \qquad (35)$$

which proves (31). The proof of ( 19) is therefore complete. 
According to the present invention, the receiver can select a set of
antenna elements, including their number and/or identity, from among a
larger group of antenna elements in order, among other things, to improve
performance of the system without increasing the extent of radio-frequency
circuitry. One process for selecting antenna elements is to use equation 4
above as a measure of quality for the particular set of antenna elements being
evaluated. That evaluation can be performed for each permutation or combination
of antenna elements in order to select the subset with optimum performance (as
determined, for instance, by selecting the subset with the greatest value
calculated according to equation 4). The selection can be repeated at any
desired points in time, including periodically.
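
The exhaustive evaluation described above can be sketched as follows. This is an illustrative sketch only: the function name `select_antenna_subset`, the sample `channel_gains`, and the stand-in metric `total_power` are assumptions introduced here. In particular, `total_power` is not equation 4; any callable that returns the equation 4 value for a candidate subset could be passed in its place.

```python
from itertools import combinations


def select_antenna_subset(num_available, subset_size, quality):
    """Score every subset of the given size and return (best_subset, best_score).

    quality is a caller-supplied function mapping a tuple of antenna indices
    to a figure of merit; larger values are treated as better.
    """
    best_subset, best_score = None, float("-inf")
    for subset in combinations(range(num_available), subset_size):
        score = quality(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical channel gains: 4 available receive elements, 2 transmit elements.
channel_gains = [
    [0.1 + 0.2j, 0.3 - 0.1j],
    [1.0 + 0.5j, -0.8 + 0.2j],
    [0.2 - 0.1j, 0.1 + 0.1j],
    [0.9 - 0.4j, 0.7 + 0.6j],
]


def total_power(subset):
    """Stand-in quality metric (an assumption): total received channel power."""
    return sum(abs(g) ** 2 for i in subset for g in channel_gains[i])


best_subset, best_score = select_antenna_subset(len(channel_gains), 2, total_power)
print(best_subset, best_score)
```

Calling the function again whenever new channel estimates are available corresponds to repeating the selection periodically. The number of candidate subsets grows combinatorially with the size of the larger group, so this brute-force form suits small groups of antenna elements.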

CLAIMS

What is claimed is: 

1. A radio receiver coupled to a plurality of receiver antenna elements,
comprising a plurality of layers for processing signals received by the receiver 
elements, each layer comprising: 

a space-time equalizer, the space-time equalizer in a first layer of the 
plurality of layers coupled to each of the receiver elements, the space-time 
equalizer in each of the other layers of the plurality of layers coupled to an 
interference canceller which receives output from the layer preceding each 
said other layer. 

2. A receiver according to claim 1 further comprising a deinterleaver
coupled to each space-time equalizer and to a decoder for output.

3. A radio receiver according to claim 1 in which each layer comprises its
own deinterleaver and decoder, and further comprises an interleaver adapted
to receive output from the decoder in said layer and from the deinterleaver in
said layer, the interleaver feeding output to the equalizer in said layer, thereby 
allowing soft decisions about information being processed iteratively by said 
layer to be fed back and forth between said equalizer and said decoder in said 
layer. 

4. A radio receiver according to claim 3 in which the interference canceller 
which receives output from the layer preceding each said other layer receives 
information from the decoder and the interference canceller in the preceding 
layer. 

5. A radio receiver according to claim 1 in which the equalizer in each
layer is connected to a common deinterleaver, which feeds a common
decoder, and in which a common interleaver receives signals from the
decoder, and is coupled to each equalizer in order to provide interleaved 
signals to the equalizer. 

6. A radio receiver according to claim 5 in which the interference canceller 
which receives output from the layer preceding each said other layer receives 
signals from the equalizer and from the interference canceller in said layer 
preceding each said other layer. 

7. A radio receiver according to claim 1 in which each space-time 
equalizer performs equalization using a minimum mean-square error criterion. 

8. A radio receiver according to claim 1 in which the receiver is adapted to 
receive and process signals from a transmitter that is coupled to a plurality of
transmit antenna elements, the number of transmit antenna elements to which 
the transmitter is coupled being less than the number of receiver antenna 
elements to which the receiver is coupled. 

9. A radio receiver according to claim 8 in which the receiver is adapted to 
receive and process signals from a transmitter that is coupled to N transmit 
antenna elements, the number of receiver antenna elements to which the 
receiver is coupled is M, and M is greater than N. 

10. A radio receiver according to claim 9 adapted to be coupled to at least 
one set of M receiver antenna elements out of K available receiver antenna 
elements. 

11. A radio receiver according to claim 1 in which the receiver is adapted to 
select the sequence in which information from the receiver antenna elements 
is to be processed.

12. A communications system, comprising:

a radio transmitter coupled to a stream of information and to a plurality 
of transmit antenna elements, said transmitter adapted to apportion a portion 
of the stream of information to each transmit antenna element by interleaving 
said portions of the information stream among the transmit antenna elements; 
and 

a radio receiver coupled to a plurality of receiver antenna elements, 
comprising a plurality of layers for processing signals received by the receiver 
elements, each layer comprising: 

a space-time equalizer, the space-time equalizer in a first layer of the 
plurality of layers coupled to each of the receiver elements, the space-time 
equalizer in each of the other layers of the plurality of layers coupled to an 
interference canceller which receives output from the layer preceding each 
said other layer. 

13. A system according to claim 12 further comprising a deinterleaver
coupled to each space-time equalizer and to a decoder for output.

14. A system according to claim 12 in which each layer comprises its own 
deinterleaver and decoder, and further comprises an interleaver adapted to 
receive output from the decoder in said layer and from the deinterleaver in 
said layer, the interleaver feeding output to the equalizer in said layer, thereby 
allowing soft decisions about information being processed iteratively by said 
layer to be fed back and forth between said equalizer and said decoder in said 
layer. 

15. A system according to claim 14 in which the interference canceller
which receives output from the layer preceding each said other layer receives 
information from the decoder and the interference canceller in the preceding 
layer.

16. A system according to claim 12 in which the equalizer in each layer is
connected to a common deinterleaver, which feeds a common decoder, and 
in which a common interleaver receives signals from the deinterleaver and 
decoder, and is coupled to each equalizer in order to provide interleaved 
signals to the equalizer. 

17. A system according to claim 16 in which the interference canceller 
which receives output from the layer preceding each said other layer receives 
signals from the equalizer and from the interference canceller in said layer 
preceding each said other layer. 

18. A system according to claim 12 in which each space-time equalizer 
performs equalization using a minimum mean-square error criterion. 

19. A system according to claim 12 in which the number of transmit 
antenna elements to which the transmitter is coupled is less than the number 
of receiver antenna elements to which the receiver is coupled. 

20. A system according to claim 12 in which the transmitter is coupled to N 
transmit antenna elements, the receiver is coupled to M antenna elements, 
and M is greater than N. 

21. A system according to claim 20 in which the receiver is adapted to be
coupled to at least one set of M receiver antenna elements out of K available 
receiver antenna elements. 

22. A system according to claim 12 in which the receiver is adapted to 
select the sequence in which information from the receiver antenna elements 
is to be processed. 

23. A communications system, comprising:

a radio transmitter coupled to a stream of information and to a plurality
of transmit antenna elements, said transmitter adapted to apportion a portion 
of the stream of information to each transmit antenna element by interleaving 
said portions of the information stream among the transmit antenna elements; 
and 

a radio receiver coupled to a plurality of receiver antenna elements, 
comprising a plurality of layers for processing signals received by the receiver 
elements, each layer comprising: 

a space-time equalizer, a deinterleaver and a decoder, the space-time 
equalizer in a first layer of the plurality of layers coupled to each of the 
receiver elements, the space-time equalizer in each of the other layers of the 
plurality of layers coupled to each of the receiver elements and to an 
interference canceller which receives output from the decoder in the layer 
preceding each said other layer, each space-time equalizer coupled to the 
deinterleaver in its layer, each deinterleaver coupled to the decoder in its 
layer; 

the equalizer in each layer adapted to perform minimum mean-square 
error equalization to signals being processed; and 

an output for said stream of information coupled to the decoders in 
each of said layers. 

24. A system according to claim 23 in which the transmitter is adapted to 
interleave portions of the information stream among the transmit antenna 
elements randomly. 

25. A system according to claim 23 in which the transmitter is adapted to 
interleave portions of the information stream among the transmit antenna 
elements pseudo-randomly.

26. A system according to claim 23 in which the number of transmit
antenna elements to which the transmitter is coupled is less than the number 
of receiver antenna elements to which the receiver is coupled. 

27. A system according to claim 26 in which the transmitter is coupled to N 
transmit antenna elements, the receiver is coupled to M antenna elements, 
and M is greater than N. 

28. A system according to claim 27 in which the receiver is adapted to be 
coupled to at least one set of M receiver antenna elements out of K available 
receiver antenna elements. 

29. A system according to claim 23 in which the receiver is adapted to 
select the sequence in which information from the receiver antenna elements 
is to be processed. 

30. A process for communicating an information stream using a radio 
transmitter and a radio receiver, including: 

a. coupling a radio transmitter to the information stream and to a 
plurality of transmit antenna elements, and interleaving portions of the 
information stream among the transmit antenna elements; 

b. transmitting said portions of the information stream; 

c. coupling a radio receiver to a plurality of receiver antenna 
elements to receive the transmitted information stream; and 

d. processing the information stream in a plurality of processing 
layers, comprising: 

(i) in a first layer, coupling the receiver antenna elements to 
an equalizer and space-time processing the signals from the receiver antenna 
elements in the equalizer; 

deinterleaving the output from the equalizer; 

decoding the deinterleaved output from the equalizer;

feeding the decoded information to a common output for
the information stream; and 

(ii) in each successive layer, coupling to an equalizer the 
output from an interference canceller that is fed by output from the preceding 
layer and space-time equalizing said output from said interference canceller; 

deinterleaving the output from the equalizer; 

decoding the deinterleaved output from the equalizer; 

and feeding the decoded information to a common output 
for the information stream. 

31. A process according to claim 30 in which steps of deinterleaving the
output from the equalizer, decoding the deinterleaved output from the
equalizer and feeding the decoded information to a common output for the
information stream are performed inside each layer and further comprising, in
each layer, reinterleaving deinterleaved and decoded output from said layer,
and feeding the reinterleaved signal to the equalizer in said layer thereby
allowing soft decisions about information being processed iteratively by said 
layer to be fed back and forth between said equalizer and said decoder in said 
layer. 

32. A process according to claim 30 in which the equalizer in each layer 
performs minimum mean-square error equalization. 

33. A process according to claim 30 in which said interleaving is performed 
randomly. 

34. A process according to claim 30 in which said interleaving is performed 
pseudo-randomly.

35. A process according to claim 30 in which coupling of said receiver to
said receiver antenna elements includes selecting a set of receiver antenna 
elements from a larger group of receiver antenna elements. 

36. A process according to claim 30 in which coupling of said receiver to 
said receiver antenna elements includes coupling to a set of receiver antenna 
elements selected from a larger group of receiver antenna elements. 

37. A process according to claim 36 in which said coupling further includes 
selecting the sequence in which signals from receiver antenna elements are 
to be processed.

Fig. 2. Space-time DDFSE with MAP processing.

Fig. 3. Performance of bit-interleaved 8PSK with rate-1/3 convolutional
and turbo coding over (a) AWGN channel, and (b) quasi-static flat Rayleigh
fading channels with N receive diversity antennas.

Fig. 4. Capacity bounds for quasi-static flat Rayleigh fading channels with N
transmit and N receive antennas. C: Shannon capacity, C_F: Foschini (original)
bound, and C_LST-II: capacity bound for LST-II.

Fig. 5. Layered space-time performance of LST-I with 2 transmit and
2 receive antennas (N = 2). Quasi-static flat Rayleigh fading channel.

Fig. 6. Layered space-time performance of LST-I with 4 transmit and
4 receive antennas (N = 4). Quasi-static flat Rayleigh fading channel.

Fig. 8. Layered space-time performance of LST-II for N = 2, 4, and 8.
Soft decisions. 2 iterations. Quasi-static flat Rayleigh fading channel.

Fig. 9. Layered space-time performance of LST-I over frequency-selective
channels: (a) TU profile, (b) HT profile. N = 4. Soft decisions.